Data Visualization - Static and Interactive Graphics using R

Brandon LeBeau

June 12, 2019

Workshop Logistics

About Me

  • I’m an Assistant Professor in the College of Education
    • I enjoy model building, particularly longitudinal models, and statistical programming.
  • I’ve used R for over 10 years.
    • I have written 4 R packages: 3 on CRAN and 1 on GitHub
      • simglm
      • pdfsearch
      • highlightHTML
      • SPSStoR
  • GitHub Repository for this workshop: https://github.com/lebebr01/iowa_ds_graphics

Why teach the tidyverse

  • The tidyverse is a series of packages developed by Hadley Wickham and his team at RStudio. https://www.tidyverse.org/
  • I teach/use the tidyverse for 3 major reasons:
    • Simple functions that do one thing well
    • Consistent implementations across functions within the tidyverse (i.e., common APIs)
    • Provides a framework for data manipulation

Static Graphics

Course Setup

## ── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Explore Data

##   PID    county state  area poptotal popdensity popwhite popblack
## 1 561     ADAMS    IL 0.052    66090  1270.9615    63917     1702
## 2 562 ALEXANDER    IL 0.014    10626   759.0000     7054     3496
## 3 563      BOND    IL 0.022    14991   681.4091    14477      429
## 4 564     BOONE    IL 0.017    30806  1812.1176    29344      127
## 5 565     BROWN    IL 0.018     5836   324.2222     5264      547
## 6 566    BUREAU    IL 0.050    35688   713.7600    35157       50
##   popamerindian popasian popother percwhite  percblack percamerindan
## 1            98      249      124  96.71206  2.5752761     0.1482826
## 2            19       48        9  66.38434 32.9004329     0.1788067
## 3            35       16       34  96.57128  2.8617170     0.2334734
## 4            46      150     1139  95.25417  0.4122574     0.1493216
## 5            14        5        6  90.19877  9.3728581     0.2398903
## 6            65      195      221  98.51210  0.1401031     0.1821340
##    percasian  percother popadults  perchsd percollege percprof
## 1 0.37675897 0.18762294     43298 75.10740   19.63139 4.355859
## 2 0.45172219 0.08469791      6724 59.72635   11.24331 2.870315
## 3 0.10673071 0.22680275      9669 69.33499   17.03382 4.488572
## 4 0.48691813 3.69733169     19272 75.47219   17.27895 4.197800
## 5 0.08567512 0.10281014      3979 68.86152   14.47600 3.367680
## 6 0.54640215 0.61925577     23444 76.62941   18.90462 3.275891
##   poppovertyknown percpovertyknown percbelowpoverty percchildbelowpovert
## 1           63628         96.27478        13.151443             18.01172
## 2           10529         99.08714        32.244278             45.82651
## 3           14235         94.95697        12.068844             14.03606
## 4           30337         98.47757         7.209019             11.17954
## 5            4815         82.50514        13.520249             13.02289
## 6           35107         98.37200        10.399635             14.15882
##   percadultpoverty percelderlypoverty inmetro category
## 1        11.009776          12.443812       0      AAR
## 2        27.385647          25.228976       0      LHR
## 3        10.852090          12.697410       0      AAR
## 4         5.536013           6.217047       1      ALU
## 5        11.143211          19.200000       0      AAR
## 6         8.179287          11.008586       0      AAR

First ggplot

Equivalent Code
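A minimal sketch of a first ggplot, assuming the `midwest` data that ships with ggplot2 (shown above). The choice of `popdensity` and `percollege` is only an illustration and may differ from the variables used on the original slides. Both calls draw the same scatterplot; the second drops the argument names and moves the mapping into `ggplot()`.

```r
library(ggplot2)

# Fully named arguments: data in ggplot(), mapping inside the geom
p_named <- ggplot(data = midwest) +
  geom_point(mapping = aes(x = popdensity, y = percollege))

# Equivalent shorter form: positional data argument, mapping in ggplot()
p_short <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point()

print(p_named)
print(p_short)
```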

Your Turn

  1. Try plotting popdensity by state.
  2. Try plotting county by state.
    • Does this plot work?
  3. Bonus: Try running just ggplot(data = midwest) from above.
    • What do you get?
    • Does this make sense?

Add Aesthetics

Global Aesthetics
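As a sketch of the difference (with `state` as an illustrative color variable): an aesthetic mapped inside `ggplot()` is global, so every layer inherits it, while one mapped inside a geom applies to that layer only.

```r
library(ggplot2)

# Local aesthetic: only geom_point() sees the color mapping
p_local <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point(aes(color = state))

# Global aesthetic: every layer added later also inherits color = state
p_global <- ggplot(midwest, aes(x = popdensity, y = percollege, color = state)) +
  geom_point()

print(p_global)
```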

Your Turn

  1. Instead of using colors, make the shape of the points different for each state.
  2. Use alpha instead of color.
    • What does this do to the plot?
  3. Try the following command: colors().
    • Try a few colors to find your favorite.
  4. What happens if you use the following code:

Additional Geoms

Add more Aesthetics

Your Turn

  1. It is possible to combine geoms, which we will do next, but try it yourself first: recreate this plot.

Layered ggplot

Remove duplicate aesthetics
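A sketch of layering two geoms, assuming `geom_point()` plus a loess smoother (the `method` and `formula` arguments are set explicitly only to avoid the informational message). When both layers use the same x and y, the shared mapping can live once in `ggplot()` instead of being repeated in each geom.

```r
library(ggplot2)

# Duplicated: both layers repeat the same x/y mapping
p_dup <- ggplot(midwest) +
  geom_point(aes(x = popdensity, y = percollege)) +
  geom_smooth(aes(x = popdensity, y = percollege), method = "loess", formula = y ~ x)

# De-duplicated: the shared mapping lives in ggplot() and both layers inherit it
p_layered <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

print(p_layered)
```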

Your Turn

  1. Can you recreate the following figure?

Brief plot customization

Brief plot customization Output

Change plot theme

More themes
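ggplot2 ships several complete themes that can be added to any plot. A short sketch using the `midwest` scatterplot as an assumed base plot:

```r
library(ggplot2)

base <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point()

# A few of the complete themes built into ggplot2
p_bw      <- base + theme_bw()
p_minimal <- base + theme_minimal()
p_classic <- base + theme_classic()

print(p_minimal)
```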

Base plot for reference

Add plot title or subtitle
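One way to add a title and subtitle is `labs()`, which also relabels the axes and legend in the same call. The label text below is illustrative, not taken from the original slides.

```r
library(ggplot2)

# labs() sets the title, subtitle, and axis/legend labels in one call
p_titled <- ggplot(midwest, aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  labs(
    title = "College education by population density",
    subtitle = "Counties in five midwestern states",
    x = "Population density",
    y = "Percent college educated",
    color = "State"
  )

print(p_titled)
```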

Color Options

Using colorbrewer2.org
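The ColorBrewer palettes from colorbrewer2.org are available in ggplot2 through `scale_color_brewer()` (and `scale_fill_brewer()` for fills). A sketch assuming the qualitative `"Set1"` palette:

```r
library(ggplot2)

# Apply a ColorBrewer palette to a discrete color mapping
p_brewer <- ggplot(midwest, aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  scale_color_brewer(palette = "Set1")

print(p_brewer)
```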

Two additional color options

viridis colors

viridis colors
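A sketch of the viridis scales built into ggplot2: the `_d` version is for discrete variables and the `_c` version for continuous ones. The variables mapped to color here are illustrative choices.

```r
library(ggplot2)

# Discrete viridis palette for the states
p_viridis_d <- ggplot(midwest, aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  scale_color_viridis_d()

# Continuous viridis palette when the color variable is numeric
p_viridis_c <- ggplot(midwest, aes(x = popdensity, y = percollege, color = percbelowpoverty)) +
  geom_point() +
  scale_color_viridis_c()

print(p_viridis_d)
```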

Zoom in on a plot

Zoom in on a plot output

Zoom using scale_x_continuous - Bad Practice

Comparing output

## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).
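Warnings like the two above are why zooming with `scale_x_continuous()` is flagged as bad practice. Setting limits on a scale turns values outside the limits into `NA`, so those rows are dropped before statistics such as the smoother are computed. `coord_cartesian()` only zooms the view and keeps every row. A sketch contrasting the two, where the 0 to 10,000 range is an arbitrary illustration:

```r
library(ggplot2)

base <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Zooms the view only; the smoother is still fit to all rows
p_coord <- base + coord_cartesian(xlim = c(0, 10000))

# Censors points outside the limits to NA, so they are dropped
# (with a warning) before the smoother is fit
p_scale <- base + scale_x_continuous(limits = c(0, 10000))

print(p_coord)
print(p_scale)
```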

Lord of the Rings Data

## Parsed with column specification:
## cols(
##   Film = col_character(),
##   Chapter = col_character(),
##   Character = col_character(),
##   Race = col_character(),
##   Words = col_double()
## )

View LOTR

## # A tibble: 6 x 5
##   Film                       Chapter                Character Race   Words
##   <chr>                      <chr>                  <chr>     <chr>  <dbl>
## 1 The Fellowship Of The Ring 01: Prologue           Bilbo     Hobbit     4
## 2 The Fellowship Of The Ring 01: Prologue           Elrond    Elf        5
## 3 The Fellowship Of The Ring 01: Prologue           Galadriel Elf      460
## 4 The Fellowship Of The Ring 02: Concerning Hobbits Bilbo     Hobbit   214
## 5 The Fellowship Of The Ring 03: The Shire          Bilbo     Hobbit    70
## 6 The Fellowship Of The Ring 03: The Shire          Frodo     Hobbit   128

Geoms for single variables

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
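The message above is ggplot2 reporting its default of 30 bins; passing `binwidth` (or `bins`) explicitly chooses a value and silences it. This sketch uses only the six LOTR rows printed earlier, not the full data, so the binwidth of 100 words is just an illustration.

```r
library(ggplot2)

# The six rows of the LOTR data printed above (not the full data set)
lotr_sample <- data.frame(
  Character = c("Bilbo", "Elrond", "Galadriel", "Bilbo", "Bilbo", "Frodo"),
  Race      = c("Hobbit", "Elf", "Elf", "Hobbit", "Hobbit", "Hobbit"),
  Words     = c(4, 5, 460, 214, 70, 128)
)

# An explicit binwidth replaces the 30-bin default and silences the message
p_hist <- ggplot(lotr_sample, aes(x = Words)) +
  geom_histogram(binwidth = 100, color = "black", fill = "gray70")

print(p_hist)
```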

Customize histogram

Customize histogram 2

Histograms by other variables - likely not useful

Histograms by other variables - one alternative

Your Turn

  1. With more than two groups, histograms are difficult to interpret due to overlap. Instead, use geom_density to create a density plot of Words for each film.
  2. Using geom_boxplot, create boxplots with Words as the y variable and Film as the x variable. Bonus: facet this plot by the variable Race. Bonus 2: zoom in on the bulk of the data.

Rotation of axis labels

Often coord_flip is better
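A sketch of the two options, assuming boxplots of `percollege` by `state` from the `midwest` data: rotating the x-axis text with `theme()`, or swapping the axes with `coord_flip()` so category labels read horizontally.

```r
library(ggplot2)

base <- ggplot(midwest, aes(x = state, y = percollege)) +
  geom_boxplot()

# Option 1: rotate the x-axis labels 45 degrees
p_rotated <- base +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Option 2: swap the axes so the labels read horizontally
p_flipped <- base + coord_flip()

print(p_flipped)
```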

Bar graphs

Add aesthetic

Stacked Bars Relative

Dodged Bars
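A sketch of the three bar layouts, assuming counties in `midwest` counted by `state` and filled by metro status. The `position` argument of `geom_bar()` switches between stacked counts (the default), stacked proportions (`"fill"`), and side-by-side bars (`"dodge"`).

```r
library(ggplot2)

# inmetro is stored as 0/1, so convert it to a labelled factor for the legend
midwest_bars <- transform(
  midwest,
  metro = factor(inmetro, levels = c(0, 1), labels = c("Non-metro", "Metro"))
)

p_stacked  <- ggplot(midwest_bars, aes(x = state, fill = metro)) + geom_bar()
p_relative <- ggplot(midwest_bars, aes(x = state, fill = metro)) + geom_bar(position = "fill")
p_dodged   <- ggplot(midwest_bars, aes(x = state, fill = metro)) + geom_bar(position = "dodge")

print(p_dodged)
```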

Change Bar Color

Your Turn

  1. Using the gss_cat data, create a bar chart of the variable partyid.
  2. Add the variable marital to the bar chart created in step 1. Do you prefer a stacked or dodged version?
  3. Take steps to make one of the plots above close to publication quality.

Additional ggplot2 resources

Additional R Resources

Interactive Graphics

Why Interactive Graphics

  • Why interactive graphics?
    • Created specifically for the web.
    • Can focus, explore, zoom, or remove data at will.
    • Allows users to customize their experience.
    • It is fun!

Interactive graphics with plotly

First Interactive Plot
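A sketch of the ggplot-to-plotly route, assuming the plotly package is installed: build a static ggplot as usual, then pass it to `ggplotly()` to get an interactive widget with hover, zoom, and legend toggling.

```r
library(ggplot2)
library(plotly)

p_static <- ggplot(midwest, aes(x = popdensity, y = percollege, color = state)) +
  geom_point()

# ggplotly() converts the ggplot object into an interactive plotly widget;
# print p_interactive to view it in the RStudio Viewer or a browser
p_interactive <- ggplotly(p_static)
```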

Customized Plot

Interactive Output

Your Turn

  1. Using the starwars data, create a static ggplot and use the ggplotly function to turn it interactive.

Create plotly by hand

Subplots Code
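A sketch of building plotly charts by hand with `plot_ly()` and combining them with `subplot()`. The `~` prefix refers to a column of the data; the two charts chosen here are illustrative.

```r
library(ggplot2)  # provides the midwest data
library(plotly)

# Two charts built directly with plot_ly()
p_scatter <- plot_ly(midwest, x = ~popdensity, y = ~percollege,
                     type = "scatter", mode = "markers", name = "Counties")
p_hist <- plot_ly(midwest, x = ~percollege, type = "histogram", name = "percollege")

# subplot() arranges the charts side by side in one widget
p_both <- subplot(p_scatter, p_hist, nrows = 1)
```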

Subplots Output

Grouped bar plot
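A sketch of a grouped bar plot in plotly, assuming counts of `midwest` counties by state and metro status. Mapping `color` creates one trace per group, and `layout(barmode = "group")` places the traces side by side instead of overlaying them.

```r
library(ggplot2)  # provides the midwest data
library(dplyr)
library(plotly)

# Count counties by state and metro status
county_counts <- midwest %>%
  mutate(metro = ifelse(inmetro == 1, "Metro", "Non-metro")) %>%
  count(state, metro)

# One bar trace per metro group, drawn side by side
p_grouped <- plot_ly(county_counts, x = ~state, y = ~n, color = ~metro, type = "bar") %>%
  layout(barmode = "group")
```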

Plot of proportions code

Plot of proportions output

Your Turn

  1. Using the gss_cat data, create a histogram of the tvhours variable.
  2. Using the gss_cat data, create a bar chart showing the partyid variable by marital status.

Scatterplots by Hand

Change symbol

Change color

Line Graph

Your Turn

  1. Using the gss_cat data, create a scatterplot showing the age and tvhours variables.
  2. Compute the average time spent watching TV by year and marital status, then plot these averages.

Highcharter: Highcharts for R

Load highcharter

## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
## # A tibble: 6 x 3
##   Film                       Race       n
##   <chr>                      <chr>  <int>
## 1 The Fellowship Of The Ring Dwarf     11
## 2 The Fellowship Of The Ring Elf       31
## 3 The Fellowship Of The Ring Hobbit   103
## 4 The Fellowship Of The Ring Man       40
## 5 The Fellowship Of The Ring Orc        3
## 6 The Fellowship Of The Ring Wizard    29

hchart function
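A sketch of `hchart()` using the six Film/Race counts printed above (Fellowship Of The Ring only, re-entered by hand). `hchart()` takes the chart type as its second argument, and `hcaes()` maps columns much like `aes()` does in ggplot2.

```r
library(highcharter)

# The six Film/Race counts printed above
lotr_counts <- data.frame(
  Film = "The Fellowship Of The Ring",
  Race = c("Dwarf", "Elf", "Hobbit", "Man", "Orc", "Wizard"),
  n    = c(11, 31, 103, 40, 3, 29)
)

# Column chart with Race on the x-axis and one series per Film
hc_counts <- hchart(lotr_counts, "column", hcaes(x = Race, y = n, group = Film))
```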

A second hchart

Histogram

Your Turn

  1. Using the hchart function, create a bar chart or histogram with the gss_cat data.
  2. Using the hchart function, create a scatterplot with the gss_cat data.

Build Highcharts from scratch

Build highcharts from scratch output

Change Chart type

Change Colors

Modify Axes

Add title, subtitle, move legend
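A sketch of building a Highcharts plot from scratch: start with `highchart()` and add each piece with the `hc_*` helpers. It reuses the Fellowship Film/Race counts printed earlier; the color, titles, and legend position are illustrative choices.

```r
library(highcharter)

races  <- c("Dwarf", "Elf", "Hobbit", "Man", "Orc", "Wizard")
counts <- c(11, 31, 103, 40, 3, 29)

hc_scratch <- highchart() %>%
  hc_chart(type = "bar") %>%                                  # chart type
  hc_xAxis(categories = races, title = list(text = "Race")) %>%
  hc_yAxis(title = list(text = "Count")) %>%
  hc_add_series(data = counts, name = "The Fellowship Of The Ring") %>%
  hc_colors("#1b9e77") %>%                                    # series color
  hc_title(text = "Counts by race") %>%
  hc_subtitle(text = "The Fellowship Of The Ring") %>%
  hc_legend(align = "right", verticalAlign = "top", layout = "vertical")
```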

Add title, subtitle, move legend output

Your Turn

  1. Build up a plot from scratch, getting the figure close to publication quality using the gss_cat data.

Correlation Matrices
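One way to show a correlation matrix is as a heatmap. This sketch uses only base R and ggplot2 (the workshop may instead use a dedicated package). The five `midwest` variables are an arbitrary illustrative choice.

```r
library(ggplot2)

# Correlations among a few midwest percentage variables
vars <- c("percollege", "perchsd", "percprof", "percbelowpoverty", "percwhite")
cor_mat <- cor(midwest[, vars])

# Reshape the matrix to long format: one row per pair of variables
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("var1", "var2", "r")

p_cor <- ggplot(cor_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(limits = c(-1, 1))

print(p_cor)
```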

Leaflet Example
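A minimal leaflet sketch, assuming the leaflet package is installed: a base tile layer plus one marker. The coordinates (approximately Iowa City, IA) are only an illustration.

```r
library(leaflet)

# Base map tiles plus a single marker with a popup label
m <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng = -91.53, lat = 41.66, popup = "Example marker (Iowa City, IA)")
```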

gganimate

gganimate example
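A sketch of a gganimate plot, assuming the gganimate package is installed: `transition_states()` steps through the levels of a variable (here `state`, an illustrative choice), and `{closest_state}` in the title shows the current level.

```r
library(ggplot2)
library(gganimate)

# Cycle through the states, one group of frames per state
anim <- ggplot(midwest, aes(x = popdensity, y = percollege)) +
  geom_point() +
  transition_states(state, transition_length = 1, state_length = 2) +
  labs(title = "State: {closest_state}")

# animate(anim) renders the frames; the default GIF renderer uses the gifski package
```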

gganimate output

Additional Resources